Searching XML Documents - Preliminary Work
Abstract
Structured document retrieval aims to exploit the structure of documents together with their content to improve retrieval results. Several aspects of traditional information retrieval, as applied to flat documents, have to be reconsidered. These include, in particular, document representation, storage, indexing, retrieval, and ranking. This paper outlines the architecture of our system and the adaptation of the standard vector space model to achieve focussed retrieval.
Similar resources
Similarity Metric for XML Documents
Since XML documents can be represented as trees, this paper builds on traditional tree edit distance to present a structural similarity metric for XML documents, which is based on edge constraint, path constraint, and inclusive path constraint, and a similarity metric based on machine learning with node costs. It extends the scope for searching XML documents, and improves recall and precision for searchi...
Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity
Due to the increasing number of XML documents, organizing them effectively in order to retrieve useful information from them is essential. A possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
Semi-automated XML Tagging of Public Text Archives: A Case Study
Public archives contain large and continuously growing volumes of electronically available text documents. In many countries, public authorities are required by law to publish certain data to satisfy the information needs of the general public. In contrast to plain text documents, semantically tagged XML documents along with appropriate query languages largely facilitate searching and browsing ...
SUSAX: Context-Specific Searching in XML Documents Using Sequence Alignment Techniques
Keyword searching, while very successful in narrowing down the contents of the Web to the pertaining subset of information, has two primary drawbacks. First, the accuracy of the search is closely coupled with the choice of keywords. Second, keywords are limited in their expressibility. In particular, they fail to adequately capture the “contextual information” implicit in most searches done by u...
An Extension of the Vector Space Model for Querying XML Documents via XML Fragments
To date, most of the work on XML query and search has stemmed from the document management and database communities and from the information needs of business applications, as evidenced by existing XML query languages such as W3C's XQuery, which is strongly inspired by SQL. We propose here to extend the realm of XML by supporting the information needs of users wishing to query XML collections i...